Skip to content

BUG: Propagate Python exceptions from c_is_list_like (#33721) #33723

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 27, 2020

Conversation

shwina
Copy link
Contributor

@shwina shwina commented Apr 22, 2020

@@ -1 +1 @@
cdef bint c_is_list_like(object, bint)
cdef bint c_is_list_like(object, bint) except -1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm don't think a bint can ever return -1. The only thing I think you could except here is NULL but would really have to refactor to do that

Is there a particular issue you are trying to solve?

Copy link
Contributor Author

@shwina shwina Apr 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing.

The ignored exception led to a segfault in https://github.com/rapidsai/cudf that I'm unable to reproduce at the moment since we've since refactored our code a bit -- apologies.

However, we can demonstrate the issue by raising artificially within c_is_list_like -- I believe it's valid to assume that Python exceptions can originate from within this function.

  1. Without the except -1:
cdef inline bint c_is_list_like(object obj, bint allow_sets):
    raise ZeroDivisionError()
    return (
        isinstance(obj, abc.Iterable)
        # we do not count strings/unicode/bytes as list-like
        and not isinstance(obj, (str, bytes))
        # exclude zero-dimensional numpy arrays, effectively scalars
        and not (util.is_array(obj) and obj.ndim == 0)
        # exclude sets if allow_sets is False
        and not (allow_sets is False and isinstance(obj, abc.Set))
    )
>>> import pandas as pd
>>> pd.api.types.is_list_like([])
ZeroDivisionError
Exception ignored in: 'pandas._libs.lib.c_is_list_like'
ZeroDivisionError:
False
  1. With the except -1:
cdef inline bint c_is_list_like(object obj, bint allow_sets) except -1:
    raise ZeroDivisionError()
    return (
        isinstance(obj, abc.Iterable)
        # we do not count strings/unicode/bytes as list-like
        and not isinstance(obj, (str, bytes))
        # exclude zero-dimensional numpy arrays, effectively scalars
        and not (util.is_array(obj) and obj.ndim == 0)
        # exclude sets if allow_sets is False
        and not (allow_sets is False and isinstance(obj, abc.Set))
    )
>>> import pandas as pd
>>>pd.api.types.is_list_like([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/lib.pyx", line 985, in pandas._libs.lib.is_list_like
    return c_is_list_like(obj, allow_sets)
  File "pandas/_libs/lib.pyx", line 989, in pandas._libs.lib.c_is_list_like
    raise ZeroDivisionError()
ZeroDivisionError

(-1 seems to be a valid value for bint)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ignored exception led to a segfault

Any idea what value was being passed? Its hard to imagine what could cause this to raise unless you specifically rigged it to make isinstance checks fail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha - so after digging up some old code, it looks the source of the problem was a RecursionError, which happened to be thrown within isinstance here.

Copy link
Contributor Author

@shwina shwina Apr 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minimal repro (segfaults for me):

In [1]: import pandas as pd

In [2]: def foo():
   ...:     pd.api.types.is_list_like("test")
   ...:     foo()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 is appropriate here as it signals cython that there maybe is an exception to check.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if u want to add the test ok or can just push this

@@ -659,6 +659,8 @@ Other
- Bug in :meth:`Series.map` not raising on invalid ``na_action`` (:issue:`32815`)
- Bug in :meth:`DataFrame.__dir__` caused a segfault when using unicode surrogates in a column name (:issue:`25509`)
- Bug in :meth:`DataFrame.plot.scatter` caused an error when plotting variable marker sizes (:issue:`32904`)
- Propagate Python exceptions from :func:`_lib.c_is_list_like` (#33721)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don’t need the whats new as this is pretty obscure and not user facing

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't think this is the right solution. As is this function would never return -1 so it's a little confusing to declare that a return value of -1 should propogate errors.

Do you know within this function where the segfault actually happens? I think the issue is probably with util.is_array if anything

@WillAyd
Copy link
Member

WillAyd commented Apr 23, 2020

May also be an issue that is_array is defined in pandas/_libs/tslibs/util.pxd but this seems to try and access from pandas/_libs/util.pxd

@jbrockmendel
Copy link
Member

May also be an issue that is_array is defined in pandas/_libs/tslibs/util.pxd but this seems to try and access from pandas/_libs/util.pxd

Out of skepticism I tried changing the cimport and it didnt change the segfault.

@shwina
Copy link
Contributor Author

shwina commented Apr 23, 2020

I still don't think this is the right solution. As is this function would never return -1 so it's a little confusing to declare that a return value of -1 should propogate errors.

The fact that the function will never return -1 is exactly why it is used to signal exceptions. Here are the relevant Cython docs:

https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#error-return-values

This pattern is also used in other places in lib.pyx:

https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/lib.pyx#L1096

@shwina
Copy link
Contributor Author

shwina commented Apr 23, 2020

Do you know within this function where the segfault actually happens? I think the issue is probably with util.is_array if anything

In this particular case, a segfault happens because a RecursionError is thrown from within the function (because the stack happens to be exhausted there), but is ignored.

@jbrockmendel
Copy link
Member

The whatsnew can be removed, then LGTM

@@ -1 +1 @@
cdef bint c_is_list_like(object, bint)
cdef bint c_is_list_like(object, bint) except -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 is appropriate here as it signals cython that there maybe is an exception to check.

@jreback jreback added the Bug label Apr 23, 2020
@WillAyd
Copy link
Member

WillAyd commented Apr 23, 2020 via email

@jreback jreback added this to the 1.1 milestone Apr 23, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shwina if you can add the test as proposed would be great (the recursion one); ping on green.

@shwina
Copy link
Contributor Author

shwina commented Apr 23, 2020

Thanks -- will do. Any suggestions for where to put that test?

@jreback
Copy link
Contributor

jreback commented Apr 23, 2020

Thanks -- will do. Any suggestions for where to put that test?

pandas/tests/dtypes/test_inference.py

@shwina
Copy link
Contributor Author

shwina commented Apr 24, 2020

I'm now having second thoughts about the robustness of this test: it assumes that the RecursionError will be raised from within c_is_list_like, but it could as well be raised from anywhere else within is_list_like(). Granted, for now, is_list_like() just calls c_is_list_like, but that could conceivably change in the future.

Any thoughts here?

@jbrockmendel
Copy link
Member

Any thoughts here?

what if got rid of c_is_list_like and just made is_list_like cpdef?

@shwina
Copy link
Contributor Author

shwina commented Apr 24, 2020

That would solve this problem. But it looks like c_is_list_like is used elsewhere within the Cython (e.g., https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/reshape.pyx#L124).

Not sure whether the overhead that will be introduced by calling a cpdef method instead is something to worry about.

@shwina
Copy link
Contributor Author

shwina commented Apr 24, 2020

I think that having a fast cdef method is valuable here.

@WillAyd
Copy link
Member

WillAyd commented Apr 24, 2020

@shwina can you post the test that you have for now? Will be easier for review and filing comments

@jreback jreback added the Segfault Non-Recoverable Error label Apr 27, 2020
@jreback
Copy link
Contributor

jreback commented Apr 27, 2020

lgtm can u merge master and resolve conflicts and ping in green

@shwina
Copy link
Contributor Author

shwina commented Apr 27, 2020

Thanks @jreback - done.

@jreback jreback merged commit b828d74 into pandas-dev:master Apr 27, 2020
@jreback
Copy link
Contributor

jreback commented Apr 27, 2020

thanks @shwina

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Segfault Non-Recoverable Error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Exceptions not propagated from c_is_list_like in lib.pyx
4 participants